Models with continous covariance

Jesse Brunner

Another pond example

  • Imagine we have a set of ponds on the Palouse
  • We think productivity (g C / m2 / year) increases with pond area
  • However, nearby ponds share unobserved variables like:
    • atmospheric deposition
    • soil types
    • algal or phytoplankton communities
    • runoff

How do we account for nearness and thus shared unobserved correlations?

Simulate some ponds

It’s always a good strategy to start with simulation

# Positions in x and y directions
set.seed(104)
xs <- runif(n=10, 0, 10)
ys <- runif(n=10, 0, 10)
# pond sizes (NOT varying with position/nearness)
sizes <- rnorm(n=10, mean=3, sd=1)

Calculate distances between pairs

If we know their locations, we can find the distances between them

D <- dist(as.matrix(cbind(xs, ys)))
(D <- round( as.matrix(D), 2) )
      1    2    3    4     5    6    7     8    9   10
1  0.00 4.54 4.00 8.42  4.81 5.49 0.96  6.22 5.93 0.24
2  4.54 0.00 0.62 4.32  5.01 6.58 4.93  6.68 1.57 4.31
3  4.00 0.62 0.00 4.93  4.50 6.52 4.33  6.71 2.19 3.77
4  8.42 4.32 4.93 0.00  9.12 7.74 9.01  7.26 2.76 8.18
5  4.81 5.01 4.50 9.12  0.00 9.85 4.17 10.36 6.43 4.76
6  5.49 6.58 6.52 7.74  9.85 0.00 6.44  0.99 6.82 5.41
7  0.96 4.93 4.33 9.01  4.17 6.44 0.00  7.18 6.41 1.10
8  6.22 6.68 6.71 7.26 10.36 0.99 7.18  0.00 6.69 6.12
9  5.93 1.57 2.19 2.76  6.43 6.82 6.41  6.69 0.00 5.69
10 0.24 4.31 3.77 8.18  4.76 5.41 1.10  6.12 5.69 0.00

What does this matrix show us?

But now what?

Enter the continuous covariance

So far we have thought of the covariance in (deviations from) parameters of distinct groups, \(i = 1,2,3,...\)

Here we will consider covariances in shared deviations of groups (or places) based on some metric of distance.

\(\rightarrow\) Want a function where covariance declines with distance between two points or ponds (or whatever)

The Gaussian version of covariance

\[ K_{i,j} = \eta^2 e^{-\rho^2 D_{i,j}^2} + \delta \] - $^2 $ is the maximum covariance at \(D = 0\) - \(\rho\) describes the rate of decay with \(D\) - \(\delta\) is the minimum covariance possible (maybe 1e-9) to avoid zeros

L2 <- function(x, eta, rho, delta){
  eta^2 * exp(-rho^2 * x^2) + delta
}

NB: There are lots of other possible covariance functions

See: https://biol609.github.io/lectures/autocorr.html

Simulate the covariances among ponds

We can use the Gaussian process to calculate covariances based on distances.

L2 <- function(x, eta, rho, delta){
  eta^2 * exp(-rho^2 * x^2) + delta
}
eta <-  0.7 # eta^2 is maximum covariance when D=0
rho <- 0.5 # rate of decline in covariance
delta <- 0
round( Sigma <- L2(D, eta, rho, delta), 2 )
      1    2    3    4    5    6    7    8    9   10
1  0.49 0.00 0.01 0.00 0.00 0.00 0.39 0.00 0.00 0.48
2  0.00 0.49 0.45 0.00 0.00 0.00 0.00 0.00 0.26 0.00
3  0.01 0.45 0.49 0.00 0.00 0.00 0.00 0.00 0.15 0.01
4  0.00 0.00 0.00 0.49 0.00 0.00 0.00 0.00 0.07 0.00
5  0.00 0.00 0.00 0.00 0.49 0.00 0.01 0.00 0.00 0.00
6  0.00 0.00 0.00 0.00 0.00 0.49 0.00 0.38 0.00 0.00
7  0.39 0.00 0.00 0.00 0.01 0.00 0.49 0.00 0.00 0.36
8  0.00 0.00 0.00 0.00 0.00 0.38 0.00 0.49 0.00 0.00
9  0.00 0.26 0.15 0.07 0.00 0.00 0.00 0.00 0.49 0.00
10 0.48 0.00 0.01 0.00 0.00 0.00 0.36 0.00 0.00 0.49

Simulate data: observations | covariances

a <- 3 # intercept
b <- 1 # slope
mu <- a + b*(sizes-2.5) # predicted values

p <- MASS::mvrnorm(n=1, 
                   mu=mu, 
                   Sigma = Sigma)

A model: pond size only

dat <- list(p=p, ID=1:10, size=sizes-2.5, D=D)

m1 <- ulam(
  alist(
    p ~ normal(mu, sig),
    mu <- a + b*size,
    
    # priors
    a ~ normal(2,1), 
    b ~ normal(0,1), 
    sig ~ exponential(1)
  ), data = dat, chains=4, cores=4
)
         mean        sd      5.5%     94.5%     rhat  ess_bulk
a   3.0510305 0.2214435 2.6780275 3.3842765 1.002649 1066.5209
b   0.8031822 0.2410533 0.4083819 1.1671399 1.003779 1148.8811
sig 0.6365006 0.1846148 0.4120160 0.9541075 1.003109  975.4565

A model: pond size + distance

m2 <- ulam(
  alist(
    p ~ normal(mu, sig),
    mu <- a + b*size + k[ID],
    
    # priors for covariances
    vector[10]:k ~ multi_normal( 0 , SIGMA ),
    matrix[10,10]:SIGMA <- cov_GPL2( D , etasq , rhosq , 0.01 ),
    etasq ~ dexp( 2 ),
    rhosq ~ dexp( 0.5 ),
    
    # other priors
    a ~ normal(2,1), 
    b ~ normal(0,1), 
    sig ~ exponential(1)
  ), data = dat, chains=4, cores=4
)
           mean        sd       5.5%     94.5%     rhat ess_bulk
etasq 0.3638863 0.3332467 0.04298979 0.8975960 1.015973 230.2829
rhosq 1.6102677 1.7449102 0.03908204 5.0672641 1.006698 366.1043
a     3.0234153 0.3602153 2.58626320 3.4331364 1.013847 282.3691
b     0.9127810 0.2280982 0.56335727 1.2488605 1.005795 504.3920
sig   0.3187199 0.2213766 0.06584913 0.7106205 1.038326 137.9153

Estimates of the decay in covariance

black is prior and blue is posterior

Estimates of the decay in covariance

black is prior and blue is posterior red is True relationship

Comparing estimates of the relationship between pond size and productivity

black is m1 (without space) and blue is m2 (distances)